Capacity Planning for Vertical Search Engines

Authors

  • Claudine Santos Badue
  • Jussara M. Almeida
  • Virgílio A. F. Almeida
  • Ricardo A. Baeza-Yates
  • Berthier A. Ribeiro-Neto
  • Artur Ziviani
  • Nivio Ziviani

Abstract

Vertical search engines focus on specific slices of content, such as the Web of a single country or the document collection of a large corporation. Despite this, like general open web search engines, they are expensive to maintain, expensive to operate, and hard to design. Because of this, predicting the response time of a vertical search engine is usually done empirically through experimentation, requiring a costly setup. An alternative is to develop a model of the search engine for predicting performance. However, this alternative is of interest only if its predictions are accurate. In this paper we propose a methodology for analyzing the performance of vertical search engines. Applying the proposed methodology, we present a capacity planning model based on a queueing network for search engines with a scale typically suitable for the needs of large corporations. The model is simple and yet reasonably accurate and, in contrast to previous work, considers the imbalance in query service times among homogeneous index servers. We discuss how we tune up the model and how we apply it to predict the impact on the query response time when parameters such as CPU and disk capacities are changed. This allows a manager of a vertical search engine to determine a priori whether a new configuration of the system might keep the query response under specified performance constraints.
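As a rough illustration of the kind of prediction the abstract describes, the sketch below estimates query response time from a very simple open queueing model. It is not the authors' model: the M/M/1 stations, the "slowest server" approximation of the broker's wait, and every name and parameter (`residence_time`, `cluster_response_time`, `cpu_speedup`, `disk_speedup`, the demand values) are assumptions chosen only to show how imbalanced service demands and hardware changes can be explored a priori.

```python
# Minimal sketch under stated assumptions (not the paper's actual model):
# each index server is a CPU station followed by a disk station, both M/M/1,
# and every query is sent to all index servers.

def residence_time(arrival_rate, service_demand):
    """Mean residence time at an M/M/1 station: D / (1 - lambda * D)."""
    utilization = arrival_rate * service_demand
    if utilization >= 1.0:
        raise ValueError("station is saturated (utilization >= 1)")
    return service_demand / (1.0 - utilization)


def server_response_time(arrival_rate, cpu_demand, disk_demand,
                         cpu_speedup=1.0, disk_speedup=1.0):
    """Mean time a query spends at one index server (CPU plus disk)."""
    cpu = residence_time(arrival_rate, cpu_demand / cpu_speedup)
    disk = residence_time(arrival_rate, disk_demand / disk_speedup)
    return cpu + disk


def cluster_response_time(arrival_rate, demands,
                          cpu_speedup=1.0, disk_speedup=1.0):
    """Crude estimate: the broker waits for the slowest index server.

    `demands` holds one (cpu_demand, disk_demand) pair per server, in seconds.
    Unequal pairs represent imbalance among otherwise homogeneous servers.
    Taking the max of mean times ignores variability, so this is optimistic.
    """
    return max(
        server_response_time(arrival_rate, cpu, disk, cpu_speedup, disk_speedup)
        for cpu, disk in demands
    )


if __name__ == "__main__":
    # Hypothetical demands: the second server has a heavier disk load.
    demands = [(0.02, 0.03), (0.02, 0.05)]
    rate = 10.0  # queries per second (assumed)
    print("baseline:", cluster_response_time(rate, demands))
    print("2x faster disks:", cluster_response_time(rate, demands, disk_speedup=2.0))
```

Comparing the two printed values shows the sort of what-if question a capacity planning model is meant to answer before any hardware is bought; a realistic model would need measured service demands and a proper treatment of the fork-join synchronization at the broker.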


Similar resources

Capacity Planning for Vertical Search Engines: An Approach Based on Coloured Petri Nets

This paper proposes a Colored Petri Net model capturing the behaviour of vertical search engines. In such systems a query submitted by a user goes through different stages and can be handled by three different kinds of nodes. The proposed model has a modular design that enables accommodation of alternative/additional search engine components. A performance evaluation study is presented to illus...


Meta Search Engines for Information Retrieval on Multiple Domains

A Web Search Engine searches for information in the World Wide Web. The number of web resources increases every day but the user is often unable to get the exact information due to the different page ranking techniques followed by individual Search Engines. Meta Search Engines solve this problem to a certain level by using more than one search engine. A Vertical Search Engine is used to provid...


Spidering and Filtering Web Pages for Vertical Search Engines

The size of the Web is growing exponentially. The number of indexable pages on the web has exceeded 2 billion (Lyman & Varian, 2000). It is more difficult for search engines to keep an up-to-date and comprehensive search index, resulting in low precision and low recall rates. Users often find it difficult to search for useful and high-quality information on the Web. Domain-specific search engin...


Comparison of Three Vertical Search Spiders

The Web has plenty of useful resources, but its dynamic, unstructured nature makes them difficult to locate. Search engines help, but the number of Web pages now exceeds two billion, making it difficult for general-purpose engines to maintain comprehensive, up-to-date search indexes. Moreover, as the Web grows ever larger, so does information overload in query results. A general-purpose search e...




Journal title:
  • CoRR

Volume abs/1006.5059

Publication date 2010